ABSTRACT

Many models have been proposed to generate Internet Autonomous
System (AS) topologies, most of which make structural
assumptions about the AS graph. In this paper we compare
AS topology generation models with several observed
AS topologies. In contrast to most previous works, we avoid 
making assumptions about which topological properties are 
important to characterize the AS topology. Our analysis 
shows that, although matching degree-based properties, the 
existing AS topology generation models fail to capture the 
complexity of the local interconnection structure between 
ASs. Furthermore, we use BGP data from multiple vantage 
points to show that additional measurement locations signif- 
icantly affect local structure properties, such as clustering 
and node centrality. Degree-based properties, however, are 
not notably affected by additional measurement locations.
These observations are particularly valid in the core. The
shortcomings of AS topology generation models stem from
an underestimation of the complexity of the connectivity in 
the core caused by inappropriate use of BGP data. 



Categories and Subject Descriptors 

C.2.1 [Network Architecture and Design]: Network
topology; I.6.4 [Simulation and Modeling]: Model
Validation and Analysis

General Terms 

1. INTRODUCTION

For many years researchers have modeled the Internet's
Autonomous System (AS) topology using graphs
obtained via various measurement techniques, e.g. BGP
routing tables [ini [5S] and traceroute maps [Hj . The
AS topology is an abstraction of the Internet which is 
commonly used to analyze its characteristics and sim- 
ulate the performance and scalability of new protocols 
and applications. Simulation methods require that AS 
topology generation models be able to provide topolo- 
gies whose properties are as close as possible to those 
of the observed AS topology. 

In this paper we evaluate existing AS topology gener- 
ation models by comparing them with several available 
datasets, representing observed AS topologies of the Internet.
Figure 1 illustrates the relationship between the
Internet topology, its measurement instances, and AS 
topology generation models. 

A key principle underlying our work is to be agnostic 
about the topological properties of the Internet. The 
main reason for our agnosticism lies in the dynamic be- 
havior of the Internet topology. In addition, observa- 
tions of the AS topology suffer from two problems. On 
the one hand, common sets of observation points have
only limited visibility of the topology [26]. On the other
hand, each observation technique suffers from measurement
artifacts. This results in problems for BGP-based
as well as traceroute-based observations of the Internet
topology. For example, traceroute can report hops that
do not map to a unique AS number [22]. As a result,
AS topology models make use of simplifying assumptions
about the actual topology [191 El]. One widely
held assumption, based on biased observations, is that
the AS topology has a hierarchical structure [3D] and
that its node-degree distribution obeys a power law [12].

Note that the AS topology neither represents the data-plane
topology nor directly corresponds to the Internet
router-level topology. Many organizations are permanently
connected to their providers, sharing an AS number [29].
Alternately, a single organization may use many AS numbers
for controlling routing.

Figure 1: Internet topology generation

Believing that at present it is impossible to know bet- 
ter, we accept the fact that the AS topology observa- 
tions suffer from biases and thus reveal different partial 
truths about the properties of the Internet. However, 
comparison of different observed AS topologies with dif- 
ferent levels of incompleteness, and topologies generated 
from different models, allows us to learn from the lim- 
itations of particular assumptions about the Internet's 
AS topology. Then, the direction of these biases and
limitations may give us insight into the actual properties
of the AS topology.

To evaluate AS topology generation models, we rely 
on a wide set of commonly used topological metrics. We 
do not claim that the set of considered metrics captures 
all important aspects of the AS topology. However, us- 
ing such an extensive set of topological metrics allows 
us to observe differences in the topological properties
revealed so far for observed and synthetic AS topologies.
Furthermore, we rely on statistical measures to compare
distributions of some metrics, allowing us to measure 
more objectively the similarity of two topologies. 

In this paper we show that the existing topology gen- 
erators capture the node degree distributions quite well, 
but fail to account either for the complex local intercon- 
nection structure between ASes, or the highly meshed 
structure of the core AS topology. Such shortcomings 
can affect the performance of protocols and applications 
when simulated using synthetic topologies. We also
show that using additional BGP peering vantage points
to collect connectivity information greatly affects
important characteristics, such as power laws and
measures of centrality, while having little effect on basic
degree-related properties. These observations suggest
that for understanding the nature of the Internet topol- 
ogy one should use rich(er) datasets, which capture a 
large portion of existing peering links. Moreover, they 
show that the significance of preferential-attachment 
has waned while peering links, underestimated in the 
past, are now far more important. 

The rest of this paper is structured as follows. In
Section 2 we contrast our work with related work.
We then introduce the existing AS topology generation
models and describe their underlying assumptions in
Section 3, and present a set of observed AS topologies
collected using different methodologies from various
locations in the world in Section 4. Subsequently,
in Section 5 we describe the metrics used for topology
characterization, and in Section 6 we discuss the statistical
measures of similarity. In Section 7 we present the
results of our comparison analysis. Because we find
that synthetic and observed topologies record biases
related to the nature of the data collection processes,
we conduct an extensive analysis of the impact of
increasing the number of BGP peering vantage points on
our topology dataset, collected from a large number of
measurement locations. This study is presented in
Section 8. Finally, in Section 9 we conclude and discuss
potential improvements in the field of AS topology
modeling.

2. RELATED WORK 

Zegura et al. [35] analyse topologies of 100 nodes gen- 
erated using pure-random, Waxman [32], exponential 
and several locality based models of topology such as 
Transit-Stub. They use metrics such as average node
degree, network diameter, and number of paths between
nodes. They find that pure random topologies represent
expected properties such as locality very poorly,
and so we exclude them from our comparison. They
suggest that the Transit-Stub method should be used 
due both to its efficiency and the realistic average node 
degree its topologies achieve. 

Faloutsos et al. [12] state that three specific proper- 
ties of the Internet AS topology are well described by 
power laws: rank exponent, out-degree exponent and 
eigen-exponent (graph eigenvalues). This work paral- 
lelled development of many models incorporating power 
laws, such as the Barabasi and Albert [3] model, based 
on incremental growth by addition of new nodes and 
preferential attachment of new nodes to existing well- 
connected nodes. 

Later, Bu and Towsley [5] used the empirical complementary
distribution (ECD) rather than standard
histograms to generate new nodes. Using characteristic
path length and clustering coefficients, they showed the
variability in graphs produced by different generators
that use the same heuristics.



Tangmunarunkit et al. [31] provide a first comparison
of the underlying characteristics of degree-based models
against structural models. A major conclusion is that
the simplest form of degree-based model performs
better than random or structural models at representing
the studied parameters. They compare three categories
of model generators: Waxman, Tiers [10] and
the Transit-Stub structural model, against the simplest 
degree based generator, the power-law random graph 
(PLRG) [1]. They define and use three metrics: ex- 
pansion, resilience and distortion. They find that the 
PLRG matches these metrics better than the random 
or structural models. Based on these metrics they con- 
clude that stricter hierarchy is present in the measured 
networks than in degree-based generators. However, 
they leave many questions unanswered about the accu- 
racy of degree-based generators and the choice of met- 
rics. 

Zhou and Mondragon [37] propose models based on 
several mathematical features, such as rich-club, inter- 
active growth and betweenness centrality. They use AS 
data from the CAIDA Skitter project to examine the 
Joint Degree Distribution (JDD) and rich-club connec- 
tivity. They show that for these data, rich-club connec- 
tivity and the JDD are closely linked for a network with 
a given degree distribution. 

In this paper, we consider more recent degree-based 
generators using a larger set of graph-theory derived 
metrics to give better insight into correct understand- 
ing of the AS topology. We compare in detail against 
a range of different Internet AS topologies at national 
and international level obtained from traceroute and 
BGP data. When choosing our metrics, we considered 
both metrics used by the topology generator designers 
and those used more widely in graph theory. A par- 
ticular point to note is that we chose not to use the 
three metrics of Tangmunarunkit et al. for two reasons. 
First, computation of both resilience and distortion is
NP-complete, requiring use of heuristics. In contrast,
all our metrics are straightforward to compute directly. 
Second, although accurate reproduction of degree-based 
metrics is well-supported by current topology genera- 
tors, our hypothesis was that local interconnectivity was 
poorly supported, and so we chose to use several metrics 
that focus on exactly this, e.g., assortativity, clustering, 
and centrality. 

3. AS TOPOLOGY MODELS 

There are many models available that claim to de- 
scribe the Internet AS topology. Several of these models 
are embodied in tools for generating simulated topolo- 
gies [IS]. In this section we describe the particular 
models whose output we compare in this paper. The 
first are produced from the Waxman model [32], derived
from the Erdos-Renyi random graphs [11], where
the probability of two nodes being connected decreases
exponentially with the Euclidean distance between them. The
second come from the Barabasi and Albert [3] model,
following measurements of various power laws in degree
distributions and rank exponents by Faloutsos et
al. [12]. These incorporate common beliefs about preferential
attachment and incremental growth. The third
are from the Generalized Linear Preference model [5],
which additionally models clustering coefficients. Finally,
Inet [33] and PFP [37] focus on alternative AS
topology characteristics: the meshed core and preferential
attachment, respectively. Each model focuses only
on particular metrics and parameters, and has only been
compared with selected AS topology observations.

3.1 Waxman 

The Waxman model of random graphs is based on
a probability model for interconnecting nodes of the
topology given by $P(u,v) = \alpha e^{-d/(\beta L)}$, where
$0 < \alpha, \beta < 1$, $d$ is the Euclidean distance between two nodes
$u$ and $v$, and $L$ is the network diameter (largest distance
between two nodes). We use the BRITE ^ implementation
of this model in this paper, which facilitates rewiring
using iterative assignment of edges to ensure that
there are no disconnected components in the generated
topology.
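
The connection rule can be illustrated with a minimal Python sketch. This is not the BRITE implementation: the placement of nodes in the unit square, the parameter values alpha = 0.15 and beta = 0.2, and the function names are assumptions made for illustration, and no rewiring step is included, so the resulting graph may be disconnected.

```python
import math
import random


def waxman_edge_probability(d, L, alpha=0.15, beta=0.2):
    """P(u, v) = alpha * exp(-d / (beta * L)); alpha and beta are assumed values."""
    return alpha * math.exp(-d / (beta * L))


def waxman_graph(n, alpha=0.15, beta=0.2, seed=0):
    """Place n >= 2 nodes uniformly in the unit square and link each pair at random."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]

    def dist(u, v):
        return math.dist(pos[u], pos[v])

    # L is the largest Euclidean distance between any two nodes.
    L = max(dist(u, v) for u in range(n) for v in range(u + 1, n))
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < waxman_edge_probability(dist(u, v), L, alpha, beta):
                edges.append((u, v))
    return pos, edges
```

Because the probability decays with distance, nearby nodes are linked far more often than distant ones, which is the locality property the model is meant to express.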

3.2 BA 

The BA model was inspired by the idea of preferentially
attaching new nodes to existing well-connected
nodes, leading to the incremental growth of nodes and
the links between them. When a node $i$ joins the network,
the probability that it connects to a node $j$ already
in the network is given by:

$$P(i,j) = \frac{d_j}{\sum_{k \in V} d_k}$$

where $d_j$ is the degree of node $j$, $V$ is the set of nodes
that have joined the network and $\sum_{k \in V} d_k$ is the sum
of degrees of all nodes that previously joined the network [53].
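
A minimal Python sketch of this attachment rule is given below. The seed graph (a complete graph on m + 1 nodes), the number m of links per new node, and the function names are assumptions chosen for illustration; they are not taken from any particular generator.

```python
import random


def attachment_probabilities(degree):
    """P(i, j) = d_j / sum_k d_k for every node j already in the network."""
    total = sum(degree.values())
    return {j: d / total for j, d in degree.items()}


def ba_graph(n, m=2, seed=0):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes."""
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    degree = {u: m for u in range(m + 1)}
    for i in range(m + 1, n):
        nodes = list(degree)
        weights = [degree[j] for j in nodes]
        targets = set()
        while len(targets) < m:
            # Draw an existing node with probability proportional to its degree.
            targets.add(rng.choices(nodes, weights=weights)[0])
        for j in sorted(targets):
            edges.append((j, i))
            degree[j] += 1
        degree[i] = m
    return edges, degree
```

Note that every edge created here has a newly arrived node as one endpoint; no link is ever added between two nodes that are already present.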

3.3 GLP 

Our third model is the Generalized Linear Preference 
model (GLP) [5]. This focuses on matching character- 
istic path length and clustering coefficients. It uses a 
probabilistic method for adding nodes and links recur- 
sively while preserving selected power law properties. 

3.4 Inet 

Inet [33] produces random networks using a preferential
linear weight for the connection probability of nodes
after modelling the core of the generated topology as a
full mesh network. Inet sets the minimum number of
nodes at 3037, the number of ASs on the Internet at the
time of its development. It similarly sets the fraction
of degree 1 nodes to 0.3, based on measurements from
RouteViews and NLANR BGP table data in 2002.

3.5 PFP 

In the Positive Feedback Preference (PFP) model, the 
AS topology of the Internet is considered to grow by in- 
teractive, probabilistic addition of new nodes and links. 
It uses a nonlinear preferential attachment probability 
when choosing older nodes for the interactive growth of 
the network, by inserting edges between existing nodes 
as well as the newly added ones. 

4. AS TOPOLOGY OBSERVATIONS 

The Internet AS topology can be inferred from various
sources of data such as BGP or traceroute [21] at
the network (IP) layer. Using BGP routing data alone
suffers from incompleteness, no matter how many vantage
points are used to collect observations. In particular,
even if BGP updates are collected from multiple
vantage points and combined, many peering and
sibling relationships are not observed [13]. Conversely,
traceroute data misses alternative paths since routers
may have multiple interfaces which are not easily identified,
and multi-hop paths may also be hidden by traffic
tunnelled via Multi-Protocol Label Switching (MPLS).
Combining these data sources still does not solve all
problems since mapping traceroute data to AS numbers
is not always accurate [22]. In this paper we attempt
to avoid these problems by comparing against
many measurement-derived datasets giving a diverse
spatial and temporal comparison across different continents
and years of measurement.

4.1 Chinese 

The first dataset is a traceroute measurement of the 
Chinese AS Topology collected from servers within China 
in May 2005. It reports 84 ASs, representing a small 
subgraph of the Internet. Zhou et al. [38] maintain that
the Chinese AS graph presents all the major topology 
characteristics of the global AS graph. The presence 
of this dataset enables us to compare the AS topology 
models at smaller scales. Further, this dataset is be- 
lieved to be nearly complete, i.e., it contains very little 
measurement bias and accurately represents the true AS 
topology for that region of the Internet. Thus, although 
it is rather small, we have included it as a valuable com- 
parison point in our studies. 

4.2 Skitter 

The second dataset comes from the CAIDA Skitter 
project. CAIDA computes the adjacency matrix of
the AS topology from the daily Skitter measurements. 



These are obtained by running traceroutes over a large 
range of IP addresses and mapping the prefixes to AS 
numbers using RouteViews BGP data. Since this data 
reports paths actually taken by packets, rather than 
path information propagated via BGP, it more directly 
represents the IP topology than the BGP data alone. 
For our study, we used the graphs for March 2004 as 
used by Mahadevan et al. [20], which reports 9,204 
unique ASs. 

4.3 RouteViews 

The third dataset we use is derived from the Route- 
Views BGP data. This is collected both as static snap- 
shots of the BGP routing tables and dynamic BGP data 
in the form of BGP update and withdrawal messages. 
We use the topologies provided by Mahadevan et al. [20] 
from two types of BGP data from March 2004: one from 
the static BGP tables and one from the BGP updates. 
In both cases, they filter AS-sets and private ASs and 
merge the 31 daily graphs into one. This dataset re- 
ports 17,446 unique ASs across 43 vantage points in 
the Internet. 

4.4 UCLA 

The fourth dataset comes from the Internet topology
collection maintained by Oliveira et al. [57]. These
topologies are updated daily using data sources such
as BGP routing tables and updates from RouteViews,
RIPE, Abilene and LookingGlass servers. Each node
and link is annotated with the times it was first and 
last observed. We use a snapshot of this dataset from 
November 2007 computed using a time window on the 
last-seen timestamps to discard ASs which have not 
been seen for more than 6 months. The resulting dataset 
reports 28,899 unique ASs. 

5. TOPOLOGY CHARACTERIZATION 

Over the past several years a variety of topological
metrics has been proposed to quantitatively characterize
topological properties of networks. In this section
we present a large set of topological metrics that will be
used to measure a distance in graph space, i.e. how distant
two graphs are topologically from each other. The
topological metrics are computed for the synthetic and
the measured AS topologies. Taken individually, these
metrics concentrate on differing topological aspects, but
when considered together they reveal the shortcomings
of topology models in faithfully capturing the topological
properties of observed AS topologies. AS topologies
are modeled as graphs $G = (\mathcal{N}, \mathcal{L})$ with a collection of
nodes $\mathcal{N}$ and a collection of links $\mathcal{L}$ that connect a pair
of nodes. The number of nodes and links in a graph is
then respectively equal to $N = |\mathcal{N}|$ and $M = |\mathcal{L}|$.

http://www.routeviews.org/
http://www.nlanr.net/
http://www.caida.org/tools/measurement/Skitter/
http://irl.cs.ucla.edu/topology/
http://www.ripe.net/db/irr.html
http://abilene.internet2.edu/

5.1 Degree 

The degree $k$ of a node is the number of links adjacent
to it. The average node degree $\bar{k}$ is defined as
$\bar{k} = 2M/N$. The node degree distribution $P(k)$ is the
probability that a randomly selected node has a given
degree $k$. The node degree distribution is defined as
$P(k) = n(k)/N$ where $n(k)$ is the number of nodes of
degree $k$. The joint degree distribution (JDD) $P(k, k')$
is the probability that a randomly selected pair of connected
nodes have degrees $k$ and $k'$. A summary measure
of the joint degree distribution is the average neighbor
degree of nodes with a given degree $k$, defined as
$k_{nn}(k) = \sum_{k'=1}^{k_{max}} k' P(k'|k)$, where $k_{max}$ is the
maximum node degree. The maximum possible $k_{nn}(k)$ value is
$N - 1$ for a maximally connected network, i.e. a complete
graph. Hence, we represent JDD by the normalized value
$k_{nn}(k)/(N - 1)$ [20] and refer to it as average neighbor connectivity.
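
The following Python sketch shows one way these degree-based quantities could be computed from an undirected edge list. The function name and the input format are assumptions; the graph is assumed to be simple, and nodes without links are not represented since they never appear in the edge list.

```python
from collections import Counter, defaultdict


def degree_metrics(edges):
    """Return average degree, P(k), and normalized k_nn(k) for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n, m = len(degree), len(edges)
    avg_degree = 2 * m / n
    # P(k) = n(k) / N
    p_k = {k: count / n for k, count in Counter(degree.values()).items()}
    # k_nn(k): mean degree found at the far end of links leaving degree-k nodes.
    neighbor_sum = defaultdict(int)
    link_ends = defaultdict(int)
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            neighbor_sum[degree[a]] += degree[b]
            link_ends[degree[a]] += 1
    knn = {k: neighbor_sum[k] / link_ends[k] / (n - 1) for k in neighbor_sum}
    return avg_degree, p_k, knn
```

For a star with four leaves, the leaves see only the hub, so their normalized neighbor connectivity is 1, while the hub sees only degree-1 nodes and scores 0.25.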

5.2 Assortativity 

Assortativity is a measure of the likelihood of connection
of nodes of similar degrees [25]. This is usually
expressed by means of the assortativity coefficient $r$:
assortative networks have $r > 0$ (disassortative networks have
$r < 0$) and tend to have nodes that are connected
to nodes with similar (dissimilar, respectively) degree.
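
The text above does not spell out a formula for $r$. A common choice, assumed in the sketch below, is the Pearson correlation between the degrees found at the two ends of each link, with every link counted in both directions. The function name is hypothetical.

```python
from collections import Counter


def assortativity(edges):
    """Pearson correlation of the degrees at both ends of each undirected link."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # Count each link once per direction so the measure is symmetric.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("undefined when all link ends have the same degree")
    return cov / var
```

A star graph is the extreme disassortative case, since every link joins the hub to a degree-1 node, and the sketch returns -1 for it.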

5.3 Clustering 

Given node $i$ with $k_i$ links, these links could be involved
in at most $k_i(k_i - 1)/2$ triangles (e.g. nodes
$a \to b \to c \to a$ form a triangle). The greater the number
of triangles the greater the clustering of this node.
The clustering coefficient, $\gamma(G)$, is defined as the average
number of 3-cycles (i.e., triangles) divided by the
total number of possible 3-cycles:

$$\gamma(G) = \frac{1}{N} \sum_{i,\, k_i \ge 2} \frac{T_i}{k_i(k_i - 1)/2}$$

where $T_i$ is the number of 3-cycles for node $i$ and $k_i$ is the
degree of node $i$. We use the distribution of clustering
coefficients $C(k)$, which in fact is the distribution of
the terms $\frac{T_i}{k_i(k_i - 1)/2}$ in the overall summation. This
definition of the clustering coefficient gives the same
weight to each triangle in the network, irrespective of
the distribution of the node degrees.
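
As a hedged illustration of this definition, the Python sketch below averages the per-node ratio of closed to possible triangles over all $N$ nodes, with nodes of degree below 2 contributing zero. The function name and edge-list input are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def clustering_coefficient(edges):
    """Average over all nodes of T_i / (k_i (k_i - 1) / 2), skipping k_i < 2."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # such a node cannot take part in any triangle
        # T_i: pairs of neighbors of this node that are themselves linked.
        triangles = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += triangles / (k * (k - 1) / 2)
    return total / len(adj)
```

For a triangle with one extra pendant node, the two pure triangle nodes score 1, the node carrying the pendant scores 1/3, and the pendant scores 0, giving 7/12 overall.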

5.4 Rich-Club 

The rich-club coefficient $\phi(\rho)$ is the ratio of the number
of links in the component induced by the $\rho$ largest-degree
nodes to the maximum possible links $\rho(\rho - 1)/2$,
where $\rho = 1 \ldots n$ are the first $\rho$ nodes ordered by their
non-increasing degrees in a network of size $n$ nodes [8].
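
A small Python sketch of this computation follows. It assumes an undirected edge list with comparable node identifiers, breaks degree ties by node identifier (an arbitrary choice made here), and starts at rank 2 because the denominator is zero for a single node.

```python
from collections import defaultdict


def rich_club(edges):
    """Return {rho: phi(rho)} for rho = 2..n, ranking nodes by non-increasing degree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Ties in degree are broken by node identifier (an assumption of this sketch).
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    club, links, phi = set(), 0, {}
    for rho, v in enumerate(ranked, start=1):
        links += len(adj[v] & club)  # links from v to nodes already in the club
        club.add(v)
        if rho >= 2:
            phi[rho] = links / (rho * (rho - 1) / 2)
    return phi
```

A value of 1 at a given rank means the top-ranked nodes form a full mesh among themselves.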

5.5 Shortest path 



The shortest path length distribution $P(h)$, as commonly
computed using Dijkstra's algorithm, is the probability
distribution of two nodes being at minimum distance
$h$ hops from each other. From the shortest path
length distribution, the average node distance in a connected
network is derived as $\bar{h} = \sum_{h=1}^{h_{max}} h P(h)$, where
$h_{max}$ is the longest among the shortest paths between
any pair of nodes. $h_{max}$ is also referred to as the diameter
of a network.
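
Since the topologies considered here are unweighted, hop counts can also be obtained with a breadth-first search from every node, which is the substitution made in the sketch below in place of Dijkstra's algorithm. The function name is an assumption, and only reachable pairs are counted.

```python
from collections import Counter, defaultdict, deque


def hop_distribution(edges):
    """Return P(h), the average distance, and the diameter of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = Counter()
    for source in adj:
        # Breadth-first search gives hop distances from the source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        counts.update(h for h in dist.values() if h > 0)
    total = sum(counts.values())
    p_h = {h: c / total for h, c in sorted(counts.items())}
    average = sum(h * p for h, p in p_h.items())
    return p_h, average, max(p_h)
```

Each pair is visited once from each end, which doubles every count but leaves the probabilities unchanged.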

5.6 Centrality 

Betweenness centrality is a measure of the number of
shortest paths passing through a node or link, a centrality
measure of a node or link within a network. The
node betweenness for a node $v$ is
$B(v) = \sum_{s \neq v \neq t \in \mathcal{N}} \frac{\sigma_{st}(v)}{\sigma_{st}}$,
where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$
and $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$
that pass through node $v$ [17]. The average node betweenness
is the average value of the node betweenness
over all nodes.

Closeness is another measure of the centrality of a
node within a network. The closeness of a node is the
reciprocal of the sum of shortest path lengths from this node
to all other reachable nodes in a graph.
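
Node betweenness on an unweighted graph can be computed with Brandes' accumulation scheme, sketched below in Python. The choice of this algorithm and the function name are assumptions of the sketch; the sum runs over ordered pairs $(s, t)$, so each unordered pair is counted twice.

```python
from collections import defaultdict, deque


def node_betweenness(edges):
    """Brandes-style betweenness for an unweighted, undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a star with four leaves, every one of the 12 ordered leaf pairs routes through the hub, so the hub scores 12 and each leaf scores 0.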

5.7 Coreness 

The $l$-core of a network (also known as the $k$-core)
is the maximal component in which each node has at
least degree $l$. In other words, the $l$-core is defined as the
component of a network obtained by recursively removing
all nodes of degree less than $l$. A node has coreness
$l$ if it belongs to the $l$-core but not to the $(l + 1)$-core.
Hence, the $l$-core layer is the collection of all nodes having
coreness $l$. The core of a network is the $l$-core such
that the $(l + 1)$-core is empty [4].
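
The recursive removal described above can be sketched in Python as a simple peeling procedure. The function name is an assumption, and the loop favors clarity over speed, so it is not tuned for graphs with tens of thousands of nodes.

```python
from collections import defaultdict


def coreness(edges):
    """Return {node: coreness} by repeatedly peeling the lowest-degree nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    level = 0
    while remaining:
        # The current level never decreases as the graph is peeled.
        level = max(level, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= level]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            core[v] = level
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= level:
                        peel.append(w)
    return core
```

The maximum value in the returned mapping corresponds to the maximum coreness reported for a topology.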

5.8 Clique 

A clique in a network is a set of pairwise adjacent 
nodes, i.e. a component which is a complete graph. The 
top clique size, also known as the graph clique number, 
is the number of nodes in the largest clique in a net- 
work [34]. 

5.9 Spectrum 

Recently, it has been observed that eigenvalues are
closely related to almost all critical network characteristics [7].
For example, Tangmunarunkit et al. [31]
classified network resilience as a measure of network robustness
subject to link failures, resulting in a minimum
balanced cut size of a network. Spectral graph theory
enables studying this problem of network partitioning
by using the graph's eigenvalues [7]. In this paper we focus
on the graph's spectrum, i.e. the set of eigenvalues of
the adjacency, the Laplacian or any other characteristic
matrix of a graph. In the graph theory literature, one
usually considers the adjacency or the Laplacian matrix
[24] [9], which employ different normalizations
and therefore lead to different spectra. Here we focus
on the spectrum of the normalized Laplacian matrix [7],
where all eigenvalues lie between 0 and 2, allowing easy
comparison of networks of different sizes. The normalized
graph spectrum has been successfully used for
tuning the topology generators [14].

6. MEASURES OF SIMILARITY 

To compare the distributions of various metrics we 
use the following statistics to determine how close two 
distributions are to each other. We perform the calcu- 
lations for each synthetic topology instance separately 
and compare them to observed topologies of the same 
size. Note that distances are relative to the metric and 
the topology size, so that distances of one metric for a 
particular sized topology cannot be compared either to 
distances of another metric for the same sized topology, 
or to distances for the same metric for different sized 
topologies. 

6.1 Kolmogorov-Smirnov (KS) distance 

Given samples of two random variables, $X_1$ and $X_2$,
the KS distance is the maximum empirical distribution
difference defined as:

$$D_{max} = \sup_x |F_{n_1}(x) - F_{n_2}(x)|$$

where $F_{n_i}(x)$ is the empirical distribution of $X_i$ ($i = 1, 2$):

$$F_{n_i}(x) = \frac{1}{n_i} \sum_{j=1}^{n_i} I_{X_{i,j} \le x}$$

where $n_1$ and $n_2$ are the number of samples from $X_1$
and $X_2$, $X_{i,j}$ is the $j$-th sample of $X_i$, and $I$ is the
indicator function.

The closely related 2-sample KS test tests the null
hypothesis that $X_1$ and $X_2$ share a (true) common distribution
based on the KS distance ($D_{max}$). However,
it is misleading to use this test to indicate whether two
distributions are similar, as it is highly sensitive to large
sample sizes, and also as the particular $X_1$ and $X_2$ compared
here are not strictly independent variables since,
e.g., nodes with high degrees tend to occur together.
Instead, $D_{max}$ alone is used in this paper to indicate the
relative closeness of distributions.
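
A minimal Python sketch of the KS distance between two samples is shown below. The function name is an assumption; the two empirical distributions are step functions, so it is enough to compare them at the observed sample values.

```python
from bisect import bisect_right


def ks_distance(sample_1, sample_2):
    """Maximum absolute difference between two empirical distribution functions."""
    xs, ys = sorted(sample_1), sorted(sample_2)
    d_max = 0.0
    for x in sorted(set(xs) | set(ys)):
        # Fraction of each sample that is less than or equal to x.
        f1 = bisect_right(xs, x) / len(xs)
        f2 = bisect_right(ys, x) / len(ys)
        d_max = max(d_max, abs(f1 - f2))
    return d_max
```

For example, the samples could be the node degrees of an observed and a synthetic topology of the same size.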

6.2 Kullback-Leibler divergence

The Kullback-Leibler (KL) divergence is also proposed
as a suitable metric* for comparing network distributions.
The KL divergence between two discrete
random variables $X_1$ and $X_2$ is defined as:

$$D_{KL}(X_1, X_2) = \sum_x P(X_1 = x) \log \frac{P(X_1 = x)}{P(X_2 = x)}$$

where $P(x)$ is the probability of $x$.

*The KL divergence is not strictly a metric as
$D_{KL}(X_1, X_2) \neq D_{KL}(X_2, X_1)$.

The KL divergence takes into account the difference 
between the distributions at all points rather than simply
at the maximum point. In this paper, Gaussian kernel
density estimation using fixed bins centered around
data in the observed data set was found to perform
well, although other methods do exist.
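
The sketch below computes the KL divergence for two discrete distributions that are already given as probability tables. It deliberately omits the kernel density estimation step mentioned above, so it is a simplification and not the procedure used in this paper; the function name and the dictionary input format are assumptions.

```python
import math


def kl_divergence(p, q):
    """D_KL(p, q) = sum_x p(x) * log(p(x) / q(x)) for dicts mapping value -> probability."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0:
            raise ValueError("q must be positive wherever p is positive")
        total += px * math.log(px / qx)
    return total
```

Swapping the two arguments generally changes the result, which is why the divergence is not a metric in the strict sense.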

7. RESULTS AND DISCUSSION 

Most past comparisons of topology generators have 
been limited to the average node degree, the node de- 
gree distribution and the joint degree distribution. The 
rationale for choosing these metrics is that if those prop- 
erties are closely reproduced, then the value of other 
metrics will also be closely reproduced ^9\ . 

In this section we show that current topology genera- 
tors are able to match first and second order properties 
well, i.e., average node degree and node degree distribu- 
tion, but fail to match many other topological metrics. 
We also discuss the importance of various metrics in our 
analysis. 

7.1 Methodology 

For each generator we specify the required number of 
nodes and generate 10 topologies of that size in order 
to provide confidence intervals for the metrics. We then 
compute the values of the metrics introduced in Section 5
for the generated and observed AS topologies.
It is important to note that all topologies studied in 
this paper are undirected, preventing us from consid- 
ering peering policies and provider-customer relation- 
ships. This limitation is forced upon us by the design 
of the generators as they do not take such policies into 
account. 

Each topology generator uses several parameters, all 
of which could be tuned to best fit a particular size 
topology, e.g., the Skitter dataset. However, there are 
two problems with attempting this tuning. First, doing
so requires selection of an appropriate goodness-of-fit
measure, of which there are many, e.g., as noted in
Section 5. Second, in any case tuning parameters to
a particular dataset is of questionable merit since, as
we argue in Section 2, each dataset is only a sample
of reality with multiple biases and inaccuracies. Typically,
topology generator parameters are tuned so as to
match the number of links in the synthetic and measured
networks, for a given number of nodes. However,
we discovered this method to be inefficient, as generating
graphs with equal numbers of links from a random
model and a power-law model gives completely different
outputs. For space reasons we dealt with this particular
issue elsewhere [14] and in this paper we simply use the
default values embedded within each generator.

7.2 Topological metrics 
                                                                 0.001  0.315
GLP           16,744  3.64  2,411   2   34,853,544   5  -0.089   0.003  0.496
INET          18,504  4.02  1,683   3   15,037,631   7  -0.195   0.004  0.514
PFP           27,611     6  3,000  16   13,355,194  24  -0.244   0.012  0.588

RouteViews    40,805   4.7  2,498   9   30,171,051  28   -0.19    0.02  <0.01
Waxman        52,336     6     35        1,185,687   4   0.205   0.001   0.25
BA            34,889     4    392   3   33,178,669   2   -0.04   0.001   0.33
GLP           31,391   3.6  4,226   4  127,547,256   6   -0.08   0.002   0.48
INET          43,343  4.97  2,828   6   31,267,607  14  -0.258   0.006  0.522
PFP           52,338     6  4,593  23   39,037,735  30  -0.252   0.009  0.564

UCLA         116,275  8.05  4,393  10   76,882,795  73  -0.165    0.05   0.32
Waxman        86,697     6     40        3,384,114   4   0.213  <0.001  0.246
BA            57,795     4    347       52,023,288   2   -0.03  <0.001    0.3
GLP           52,456  3.63  7,391   2  371,651,147   6   -0.08  <0.001  0.486
INET          91,052   6.3  6,537  12   88,052,316  38    -0.3    0.01   0.55
PFP           86,696     6  8,076  26  123,490,676  40  -0.218    0.01   0.57



In this section we discuss the results for each met- 
ric separately and analyze the reasons for differences 
between the observed and the generated topologies. 

Table 1 displays the values of various metrics (columns)
computed for different topologies (rows). Blocks of rows
correspond to a single observed topology and the gen- 
erated topologies with the same number of nodes as 
the observed topology. Bold numbers represent nearest 
match of a metric value to that for the relevant ob- 
served topology. Rows in each block are ordered with 
the observed topology first, followed by the generated 
topologies from oldest to newest generator. For syn- 
thetic topologies, the value of the metrics is averaged 
over the 10 generated instances. Note that Inet requires 
the number of nodes to be greater than 3037 and hence 
cannot be compared to the Chinese topology. 

We observe a small but measurable improvement from 
older to newer generators in how well they match some 
metrics such as maximum degree, maximum coreness, 
and assortativity coefficient. This suggests that topol- 
ogy generators have been successively improved to bet- 
ter match some properties of the observed topologies. 
However, the number of links in the generated topolo- 
gies may differ considerably from the observed topology 
due to the assumptions made by the generators. The 
Waxman and BA generators fail to capture the maximum
degree, the top clique size, maximum betweenness
and coreness. Those two generators are too simplistic 
in the assumptions they make about the connectivity 
of the graphs to generate realistic AS topologies. Wax- 
man relies on a random graph model which cannot cap- 
ture the clique between tier-1 ASs nor the heavy tail of 
the node degree distribution. BA tries to reproduce the 
power-law node degrees with its preferential attachment 
model but fails to reach the maximum node degree, as 
it only adds edges from newly arriving nodes and never between
existing ones. Hence, neither of these two models is able
to create the highly-connected core of tier-1 ASs. PFP 
and Inet manage to come closer to the values of the 
metrics of the observed topologies. For Inet this is be- 
cause it assumes that 30% of the nodes are fully meshed 
(at the core), whereas for PFP its rich-club connectivity
model allows edges to be added between existing nodes.
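
The limitation of plain preferential attachment described above can be illustrated with a minimal sketch. It is not the BA generator used in this study: the function names, the fully meshed seed of m + 1 nodes and the parameter values are assumptions made for illustration only.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment; return a set of edges (u, v), u < v."""
    rng = random.Random(seed)
    # Assumption of this sketch: start from a fully meshed seed of m + 1 nodes.
    edges = {(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)}
    # Each node appears in `targets` once per incident edge, so a uniform draw
    # from this list is a draw proportional to node degree.
    targets = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            # Edges only ever join the new node to an existing one;
            # no edge is added between two existing nodes.
            edges.add((old, new))
            targets.extend((old, new))
    return edges


def max_degree(edges):
    """Largest node degree in an undirected edge set."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values())


ba_edges = barabasi_albert(n=1000, m=2)
print(len(ba_edges), max_degree(ba_edges))
```

Because every step adds exactly m links, the link count is fixed by n and m, and the core can only densify through newly arriving nodes.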

7.2.1 Node degree distribution

In Figure [2] we show the CCDF of the node degree 
for all topologies on a log-log scale. We observe that 
the Chinese topology does not exhibit power law scal- 
ing due to its limited size, whereas all the larger AS 
topologies do exhibit power-law scaling of node degrees. 
The Waxman generator completely fails to capture this 
behavior as it is based on a random graph model, but
recent topology generators do capture this power law
behavior of the node degrees quite well. In the case 
of the RouteViews and UCLA datasets, Inet and PFP 
outperform other topology generators. Note that, con- 
trary to RouteViews where the degree distribution dis- 
plays strict power law scaling, the UCLA dataset has a 
slightly concave shape. In summary, more recent gen- 
eration models reproduce node degree distribution well, 
as expected since most focus has been on this metric. 
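
As an illustration of the quantity plotted in Figure 2, the following minimal sketch computes a degree CCDF from an undirected edge list. The function name, the toy graph and the convention P[degree >= k] are assumptions of this sketch, not details taken from the study.

```python
from collections import Counter


def degree_ccdf(edges):
    """Return sorted (k, P[degree >= k]) pairs for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())  # number of nodes having each degree
    ccdf, remaining = [], n
    for k in sorted(counts):
        ccdf.append((k, remaining / n))  # fraction of nodes with degree >= k
        remaining -= counts[k]
    return ccdf


# Toy graph (a hub with four neighbors plus one extra link), not real AS data.
toy_edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3)]
ccdf = degree_ccdf(toy_edges)
print(ccdf)
```

Plotting these pairs on log-log axes gives a curve of the kind compared in Figure 2; an approximately straight line would indicate power-law scaling.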

Figure 2: Comparison of node degree CCDFs.



7.2.2 Average neighbor connectivity

Figure 3: Comparison of average neighbor connectivity CCDFs.

Neighbor connectivity has been far less studied than 
node degree, although it is very important to match lo- 
cal interconnection among a node's neighbors when re- 
producing the topological structure of the Internet [2^ . 
Figure [3] shows the CCDF of the average neighbor degrees
for all topologies. We observe that Waxman, BA
and GLP all underestimate the local interconnection
structures around nodes due to their way of modeling
node interconnections. Note that BA and GLP typically
generate graphs with far fewer links than the observed
topologies, so they underestimate neighbor degrees
on average.

For the larger topologies, i.e. RouteViews and UCLA,
PFP and Inet typically overestimate the neighbor connectivity,
as they both place a large number of inter-AS
links in the core. In addition, the shapes of the neighbor
connectivity CCDF differ for the larger topologies:
Inet and PFP have two regimes, one for highly con- 
nected nodes (those with larger neighbor connectivity), 
and another for low-degree nodes. On the other hand, 
observed topologies have a smooth region for the high- 
degree nodes followed by a rather stable region caused 
by similar degree nodes. We observe that the highest 
degree nodes in the UCLA topology have very high val- 
ues of neighbor connectivity. This is consistent with 
the belief that tier-1 providers are densely meshed. In 
summary, existing topology generators do not repro- 
duce local interconnection behavior well, even though 
it is an important aspect of today's AS topology. 
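
The neighbor connectivity discussed above can be sketched as the mean degree of a node's neighbors, averaged over all nodes of the same degree. The function name and the toy graph below are hypothetical, and the exact normalization used in the study may differ.

```python
from collections import defaultdict


def average_neighbor_degree(edges):
    """Map each degree k to the mean neighbor degree of nodes having degree k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    adj = dict(adj)
    by_degree = defaultdict(list)
    for node, nbrs in adj.items():
        mean_nbr_degree = sum(len(adj[w]) for w in nbrs) / len(nbrs)
        by_degree[len(nbrs)].append(mean_nbr_degree)
    return {k: sum(vals) / len(vals) for k, vals in sorted(by_degree.items())}


# Toy graph: hub 0 with four neighbors, plus one link between neighbors 1 and 2.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
knn = average_neighbor_degree(toy_edges)
print(knn)
```

In this toy graph the low-degree nodes have high-degree neighbors and vice versa, the same qualitative pattern expected of stub ASs attached to large providers.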

7.2.3 Clustering coefficients 

Like the average neighbor connectivity, the cluster- 
ing coefficient gives information about local connectiv- 
ity of the nodes. It is important to reproduce clustering 
due to its impact on the local robustness in the graph: 
nodes with higher local clustering have increased local 
path diversity [20^ . Clustering properties of a graph can 
directly affect simulations on performance of multipath 
and resilience of overlay routing. 




Figure 4: Comparison of clustering coefficients.

Figure [4] displays the clustering coefficients of all nodes
in the topologies. Error bars indicate 95% confidence 
intervals around the mean values of the 10 topologies 
from each generator. We observe that Waxman and BA 
significantly underestimate clustering, consistent with 
their simplistic way of connecting nodes. GLP approximates
the clustering of the Chinese topology quite well
but fails in the case of the larger observed topologies.
PFP and Inet capture clustering reasonably well com- 
pared to the other topology generators. However, Inet 
does not reproduce the tail of the distribution well due 
to the randomness factor in its model for edge addition 
once the core is fully meshed. 

We also observe that for medium degree nodes, clus- 
tering coefficients display rather high variability which 
increases with the size of the observed topologies. This 
behavior seems to be a property of the observed AS 
topology of the Internet (Section [5]), and not only an 
artifact of the incompleteness of observed AS topolo- 
gies. 

In summary, all topology generators fail to properly 
capture clustering, typically underestimating local con- 
nectivity. Only Inet for the UCLA topology overesti- 
mates connectivity of low-degree nodes while still un- 
derestimating it for high-degree nodes. Current topol- 
ogy generators do not seem to have good models of local 
node connectivity. 
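
For reference, the local clustering coefficient of a node can be sketched as the fraction of pairs of its neighbors that are themselves linked. The code below is a minimal illustration with hypothetical names and a toy graph; treating nodes of degree below two as having clustering 0 is an assumption of this sketch.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(edges):
    """Return {node: clustering coefficient} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    adj = dict(adj)
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0  # assumption: undefined cases are reported as 0
            continue
        # Count links that exist between pairs of neighbors of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        result[node] = 2 * links / (k * (k - 1))
    return result


# Toy graph: hub 0 with four neighbors, plus one link between neighbors 1 and 2.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
clustering = local_clustering(toy_edges)
print(clustering)
```

Grouping these per-node values by node degree gives curves of the kind shown in Figure 4.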

7.2.4 Rich-club connectivity 

Figure 5: Comparison of rich-club connectivity coefficients.

Rich-club connectivity gives information about how
well-connected nodes of high degree are among themselves.
Figure [5] makes it clear that the cores of the
observed topologies are very close to a full mesh, with 
values close to 1 on the left of the graphs. The error bars 
again indicate the 95% confidence intervals around the 
mean values of the different instances of the generated 
topologies. Waxman and BA perform poorly for this 
metric in general. Only PFP and Inet generate topolo- 
gies with a dense enough core compared to the observed 
topologies. Given the emphasis that PFP gives to the 
rich-club connectivity, it overestimates it in the case of 
the Chinese and RouteViews topologies. Inet performs 
well due to its emphasis on a highly connected core, 
especially for larger topologies where data has been collected
across multiple peering points.

In summary, most topology generators underestimate
the importance of rich-club connectivity of the AS topology.
PFP is the only topology generator that emphasizes
the importance of the dense core of the AS topology.
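
A minimal sketch of a rank-based rich-club coefficient follows: the fraction of all possible links that exist among the r highest-degree nodes. The rank-based form, the tie-breaking by node identifier, the function name and the toy graph are assumptions of this sketch; the edge list is assumed to contain no duplicate links.

```python
from collections import Counter


def rich_club_by_rank(edges, r):
    """Density of links among the r highest-degree nodes (r >= 2)."""
    if r < 2:
        raise ValueError("r must be at least 2")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Ties in degree are broken by node identifier (an arbitrary choice).
    top = set(sorted(degree, key=lambda node: (-degree[node], node))[:r])
    links = sum(1 for u, v in edges if u in top and v in top)
    return 2 * links / (r * (r - 1))


# Toy graph: a fully meshed core {0, 1, 2} with four stub nodes attached.
toy_edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (1, 5), (2, 6)]
print(rich_club_by_rank(toy_edges, 3), rich_club_by_rank(toy_edges, 4))
```

A value of 1 for small r corresponds to the fully meshed core visible on the left of the plots in Figure 5.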

7.2.5 Shortest path distributions 




Figure 6: Comparison of shortest path distribu- 
tions (number of hops). 

Figure [6] displays the distributions of shortest path 
length. Apart from BA, most topology generators ap- 
proximate the shortest path length distribution of the 
Chinese graph quite well due to its small size. For the 
other topologies, PFP and Inet generally underestimate 
the path length distribution while Waxman and BA 
overestimate. Particular generators seem to capture the 
path length distribution for particular topologies well: 
PFP matches that for Skitter well and GLP is close for
RouteViews. Inet and PFP both do a better job for
UCLA than for RouteViews but both still underesti- 
mate the distribution. 

In summary, shortest path length is not well captured 
by any topology generator. Given the poor match of 
generators to local connectivity metrics, this is not sur- 
prising. 
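
The hop-count distribution compared in Figure 6 can be sketched with a breadth-first search from every node. The function name and the toy path graph are hypothetical; node identifiers are assumed to be comparable (for example, integer AS numbers), and unreachable pairs are simply ignored.

```python
from collections import Counter, defaultdict, deque


def hop_distribution(edges):
    """Return {hops: fraction of connected node pairs at that distance}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    adj = dict(adj)
    hops = Counter()
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # unweighted BFS from src
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for dst, d in dist.items():
            if src < dst:  # count each unordered pair once
                hops[d] += 1
    total = sum(hops.values())
    return {d: count / total for d, count in sorted(hops.items())}


# Toy path graph 0 - 1 - 2 - 3, not real AS data.
path_hops = hop_distribution([(0, 1), (1, 2), (2, 3)])
print(path_hops)
```

One BFS per node costs time proportional to the number of nodes times the number of links, which is manageable for graphs of the sizes considered here.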

7.2.6 Spectrum 

The spectrum of the normalized Laplacian matrix is 
a powerful tool for characterizing properties of a graph. 
Graphs with similar spectra tend to share similar
topological structure, although an identical spectrum
does not strictly guarantee an identical graph.

Figure [7] displays the CDF of the eigenvalues com- 
puted from the normalized Laplacian matrix of each 
topology. 

Figure 7: Comparison of cumulative distributions
of eigenvalues (from normalized Laplacian).

As with other topological metrics, Inet and PFP perform
best. The difference between the topology generators
is most easily observed around the eigenvalues
equal to 1. These eigenvalues play a special role as they
indicate repeated duplications of topological patterns
within the network. By duplication, we mean different
nodes having the same set of neighbors, giving their induced
subgraphs the same structure. Through repeated
duplication, one can create networks with high multiplicity
of eigenvalue 1 [2]. Further, if a network is bipartite,
i.e., its nodes can be split into two parts with no
links between nodes of the same part, then its spectrum
will be symmetric about 1. This phenomenon can also
arise through repeated structure duplication.

We observe that the spectra have a high degree of
symmetry around the eigenvalue 1, and so the observed
AS topologies appear close in spectral terms to a bi- 
partite graph. In the AS topology many ASs share 
a similar set of upstream ASs without being directly 
connected to each other. Inet and PFP are good exam- 
ples of topology generators where this strategy is imple- 
mented. Note that the simple preferential attachment 
model of BA does not reproduce the eigenvalues around 
1 very well. In the simple BA model, new nodes connect 
randomly to a given number of existing nodes, favoring 
connections to high degree nodes. In the Internet, in
contrast, although small ASs may tend to connect to
large upstream providers, they might not connect pref- 
erentially to the largest ones, connecting instead to na- 
tional or regional providers. In summary, these results 
provide further evidence that the interconnection struc- 
ture of the AS topology is more complex than current 
models assume. 
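
The link between node duplication and the eigenvalue 1 can be checked on a small example. The sketch below assumes the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}; the function name and the toy graph are hypothetical. Two non-adjacent nodes with the same neighbor set yield a vector x, equal to +1 on one node and -1 on the other, for which L x = x.

```python
import math
from collections import defaultdict


def normalized_laplacian_apply(edges, x):
    """Return L x as a dict, where x maps nodes to values (missing nodes are 0)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    return {
        u: x.get(u, 0.0)
        - sum(x.get(w, 0.0) / math.sqrt(degree[u] * degree[w]) for w in adj[u])
        for u in degree
    }


# Hypothetical toy graph: stub nodes 3 and 4 both attach to nodes 1 and 2
# and are not linked to each other, so they duplicate each other's role.
toy_edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (1, 5)]
x = {3: 1.0, 4: -1.0}
Lx = normalized_laplacian_apply(toy_edges, x)
print(Lx)
```

Every additional pair of duplicated nodes contributes another such vector, which is why repeated duplication raises the multiplicity of the eigenvalue 1.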

7.3 Measures of similarity 

In Section 7.2 we presented visual evidence for the
(dis)similarity both among topology generators and between
generators and observed topologies. In this section
we present a more objective approach, based on the
statistical distance metrics described in Section [6]: the
Kolmogorov-Smirnov (KS) distance and the Kullback-Leibler
(KL) divergence.

In the following tables, the values of the distances
and the standard deviations are shown for the topological
metrics with distributions: node degree, neighbor
connectivity, clustering coefficient, and rich-club coefficient.
We provide the average values of the statistical
distances and the standard deviation around the average
over the 10 topologies generated by each topology
generator. When no deviation is shown, it was < 0.01.

Table 2: Statistical distances for Chinese vs. synthetic topologies.

                Node degree                   Neighbor connectivity
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.27±0.07     0.6±0.1         0.75±0.03     27.4±4.1
BA          0.12±0.03     3.5±1.8         0.74±0.07     18.4±8.1
GLP         0.24±0.08     0.64±0.31       0.41±0.08     1.18±0.72
PFP         0.17±0.04     1.45±0.48       0.51±0.07     0.85±0.25

                Clustering coefficients       Rich-club coefficients
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.61±0.03     22.31±4.5       0.22±3.5      4.2±2.8
BA          0.65±0.1      13.5±5.2        0.28±0.01     2.78±1.4
GLP         0.31±0.05     1.08±0.6        0.26±0.04     0.34±0.19
PFP         0.32±0.11     0.34±0.14       0.12±0.01     0.11±0.02



Both statistical measures globally confirm the visual
inspection of Section 7.2: more recent topology generators
produce topologies whose properties are closer to
the observed topologies. Table [2] provides the KS and 
KL results for topology generators against the Chinese 
topology for the four chosen topological metrics. Topol- 
ogy generators do not show improvement for the node 
degree. However, for the other three metrics successive
topology generators do show improvement. Overall,
the PFP and GLP models both have small relative
distances to the Chinese dataset, due to the small size
of the dataset, the presence of high degree nodes and
fewer inter-AS connections.

Table [3] displays the results of the statistical measures
against the Skitter topology. We observe a
particularly good match of the node degree distribu- 
tion by Inet. PFP outperforms all other topology gen- 
erators for the clustering coefficients and the rich-club 
coefficients, consistent with the visual inspection. 

Statistical distances for RouteViews (Table [4]) show
that Inet again better matches the node degree distribu- 
tion. GLP and Inet both perform better than other gen- 
erators for neighbor connectivity. PFP performs better 
than the others on the clustering coefficients. On the 
other hand, none of the generators manages to obtain a 
close distance for the rich-club coefficients. In Figure [5]

Table 3: Statistical distances for Skitter vs. synthetic topologies.

                Node degree                   Neighbor connectivity
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.54±0.04     2.27±0.15       0.99±0.01     44.48±0.08
BA          0.41±0.02     17.1±2.6        0.99±0.01     44.7±0.25
GLP         0.31±0.06     17.42±4.1       0.31          2.16
Inet        0.075±0.02    4.13            0.40±0.02     1.82±0.31
PFP         0.13±0.03     18.2±2.31       0.13±0.05     18.2±2.21

                Clustering coefficients       Rich-club coefficients
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.91±0.02     40.62±1.2       0.2±0.05      6.75±1.3
BA          0.9±0.05      44.62±0.12      0.37±0.09     7.34±1.21
GLP         0.7±0.02      19.12±1.8       0.3±0.01      4.34±0.45
Inet        0.74±0.01     11.34±1.23      0.25          3.82±0.2
PFP         0.09±0.02     0.59±0.19       0.03          0.91±0.14

Table 5: Statistical distances for UCLA vs. synthetic topologies.

                Node degree                   Neighbor connectivity
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.52±0.01     1.33±0.9        0.99±0.01     46.31±1.3
BA          0.17±0.03     2.15±0.8        0.99±0.01     46.42±0.7
GLP         0.18±0.05     2.21±0.7        0.32±0.03     0.63±0.04
Inet        0.2±0.02      5.34            0.29±0.01     0.41±0.01
PFP         0.12±0.03     2.17±0.8        0.48±0.05     0.83±0.21

                Clustering coefficients       Rich-club coefficients
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.93±0.02     44.2±0.34       0.31          14.5±4.32
BA          0.99±0.01     45.42           0.5           14.32±2.3
GLP         0.82±0.01     33.32±0.9       0.42±0.01     8.9±1.2
Inet        0.38±0.01     0.53±0.01       0.13          2.85±0.12
PFP         0.38±0.02     0.79±0.15       0.16          3.23±0.4

Table 4: Statistical distances for RouteViews vs. synthetic topologies.

                Node degree                   Neighbor connectivity
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.5±0.03      50.77±0.01      0.94±0.01     42.68±0.25
BA          0.2±0.02      50.74±0.01      0.94±0.01     42.91±0.34
GLP         0.18±0.03     50.73±0.01      0.12±0.02     0.1±0.02
Inet        0.07          9.92            0.23±0.02     0.2±0.01
PFP         0.11±0.03     50.7            0.62±0.02     1.25±0.07

                Clustering coefficients       Rich-club coefficients
            KS distance   KL divergence   KS distance   KL divergence
Waxman      0.83±0.05     39.4±1.2        0.97          42.23±0.43
BA          0.96±0.01     44.08±0.21      0.97          43.07±0.6
GLP         0.58±0.02     12.9±0.65       0.96          40.7±0.9
Inet        0.39±0.01     1.35±0.2        0.93          34.18±1.1
PFP         0.32±0.06     0.21±0.03       0.92          27.4±2.45

Inet seemed to be close to RouteViews for rich-club co- 
efficients, but this is not supported by the statistical 
distances. The behavior for rich-club connectivity is 
surprising, especially for PFP which is highly biased 
towards reproducing rich-club connectivity. We believe 
this is due mainly to the addition of many extra peering 
links in this dataset, which was not captured by model 
designers. 

Statistical distance results for UCLA (Table [5]) reveal
a more complex picture. For node degrees, no genera- 
tor seems to outperform the others, although Inet does 
perform worst. GLP, Inet and PFP perform equally 
well on the neighbor connectivity. For clustering coeffi- 
cients and rich-club connectivity, Inet and PFP perform 
better than the others. 

Visual inspection of Section 7.2 seemed to suggest
that each successive topology generator introduced improvements
in its matching of observed AS topologies.
Waxman and BA perform poorly both in visual
inspection and in the statistical distances. The KL divergences
clarify the difference of the two distributions
across all the values and hence minimize the effects of
local differences at certain values.

Our statistical measures show that apparent visual 
closeness of two distributions does not mean close dis- 
tance in distributional terms, due partly to the use 
of logarithmic scale axes. Improvements in successive 
topology generators are not consistent across all metrics 
and across all observed topologies. Nonetheless, most of 
the time the most recent generators, Inet and PFP, do 
outperform the other topology generators. This indicates
that more attention should be given to capturing
the effects of peering links in the core and at the edge 
of the AS topology, as this is the significant difference 
between these two generators and the older Waxman 
and BA generators. 
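
The two statistical distances used in this section can be sketched for discrete samples such as node degrees. This is a minimal illustration, not the procedure of Section 6: the function names are hypothetical, the KL divergence is computed in nats over the union of observed values, and the small constant `eps` used to avoid taking the logarithm of zero is an assumption of this sketch.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def kl_divergence(sample_p, sample_q, eps=1e-9):
    """D(P || Q) over the union of observed values, with eps smoothing."""
    count_p, count_q = Counter(sample_p), Counter(sample_q)
    support = sorted(set(count_p) | set(count_q))
    norm_p = len(sample_p) + eps * len(support)
    norm_q = len(sample_q) + eps * len(support)
    total = 0.0
    for value in support:
        p = (count_p[value] + eps) / norm_p
        q = (count_q[value] + eps) / norm_q
        total += p * math.log(p / q)
    return total


# Hypothetical degree samples, not data from the tables above.
observed_degrees = [1, 1, 2, 3]
generated_degrees = [1, 2, 2, 3]
print(ks_distance(observed_degrees, generated_degrees))
print(kl_divergence(observed_degrees, generated_degrees))
```

The KS distance reacts to the single largest gap between the two CDFs, whereas the KL divergence accumulates differences over all values, which is consistent with the two measures ranking generators differently in some of the tables.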

8. MULTIPLE VANTAGE POINTS 

The previous section studied in detail how well topology
generators capture the properties of observed AS
topologies. In this section, we examine why
topology generators capture different properties of observed
AS topologies with varying degrees of success.
To that end we examine the impact on the metrics of 
the number of vantage points from which BGP data is 
collected. For our analysis we collected BGP data from 
over 40 RouteViews peering points, for a period of 6 
months from May 2007. This time period was chosen 
to be the same as that used to build the UCLA dataset. 

Zhang et al. [36] also examine the impact of the selection
of route monitors on topology visibility and the
consequences on AS relationship inference and AS-level 
path prediction. They analyze a range of monitor se- 
lection schemes and their influence on the number of 
observed links as well as network properties. They sug- 
gest that the accuracy of AS relationship inference may 
decrease as the number of monitors increases, and go on 
to quantify the improvements in identifying AS relationships
and anomaly detection in the data. In contrast,
our work focuses on understanding the underlying ef- 
fect placement of vantage points has on inferring both 
the network topology and its associated dynamics. We 
are also interested in examining the distortion of local 
topological properties by using a different number of 
vantage points. 

Table [6] shows the values of the topological metrics
in the same way as Table [1] for AS topologies obtained
from different numbers of observation points. When 
comparing the AS topologies using 1 and 10 observa- 
tion points, we see a significant increase in the number 
of nodes and links. Hence, one might also expect a 
significant difference in the other metrics, and indeed, 
the maximum node degree almost triples and the num- 
ber of fully-meshed nodes almost doubles. As a conse- 
quence, the size of the core increases, indicated by the 
maximum coreness value. In turn, the number of short- 
est paths crossing the core also increases as indicated 
by the maximum betweenness. On the other hand, we 
see that going from 1 to 10 observation points slightly 
decreases the value of the clustering coefficient. Most 
probably this is because with 10 observation points we 
discover more of the core than the edge of the network, 
which does not increase the overall value
of the clustering coefficient. With 25 or more observa- 
tion points the links on the edge of the network are also 
discovered, contributing to the increase of the value of 
the clustering coefficient. This behavior is confirmed by 
a slight decrease of the value of the maximum between- 
ness from 10 to 25 observation points. 

Preferential attachment models originate in the be- 
lief that small ASs tend to connect to large upstream 
ASs, leading to a disassortative network. Although the 
value of the assortativity coefficient is negative for the 
AS topology, it is not affected by an increase in the 
number of observation points. The links added by in- 
creasing the number of observation points seem to be 
neutral for the assortativity of the AS topology. One 
implication is that the links that can be discovered by 
using more observation points do not preferentially in- 
terconnect ASs of any particular degree. We conjec- 
ture that this is due to the type of peering relationships 
that are missed. If node degrees give an indication of 
the likely type of peering relationship, then we suggest 
that BGP does not preferentially miss peer-peer relationships,
which are believed to be more difficult to observe
than customer-provider ones [6].
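
The assortativity coefficient referred to above can be sketched as the Pearson correlation between the degrees found at the two ends of each link. The function name and the toy star graph are hypothetical; the sketch assumes the graph is not regular, since the coefficient is undefined when all degrees are equal.

```python
from collections import Counter


def assortativity(edges):
    """Pearson correlation of the degrees at either end of each link."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every undirected link in both directions so the result is symmetric.
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var  # raises ZeroDivisionError for regular graphs


# Toy star graph: one hub linked to four stubs, the most disassortative case.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(assortativity(star_edges))
```

Negative values indicate that low-degree nodes tend to attach to high-degree ones, as reported for the AS topology in Table 6.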

Figure 8: Comparison of effects of the number
of peering points.

We now turn in more detail to the effect of the number
of peering points on four particular topological metrics
(see Figure [8]). The addition of observation points
mostly affects node degree distribution for high degree
nodes. As we increase the number of observation points,
we see that on average the neighbors of a node will have
a higher degree. However, this does not hold for nodes
whose neighbors already have high degrees (left part of
Figure [8]). Those nodes correspond to stub networks connected
to very well interconnected upstream providers.
For the clustering coefficient, when moving from one to 
several observation points, the difference is striking. For 
all node degrees, the clustering coefficient significantly 
increases. On the other hand, when moving from a few 
peerings to many, the difference appears most for high 
degree nodes. This illustrates the better observability of 
links in the core compared to the edge of the network. 
Rich-club connectivity confirms the previous observa- 
tions in that adding a few observation points is enough 
to discover the core links. 

In this section we have illustrated the importance 
of relying on a sufficiently large number of observation 
points in order to properly capture the actual proper- 
ties of the AS topology. Using only a few observation 
points has led researchers to simplify the complexity 
of the interconnection structure between ASs. The im- 
proper AS topology on which researchers have relied has 
caused the creation of topology generators that underes- 
timate this interconnection structure between ASs. Our 
results show that researchers must use rich datasets for 
an accurate understanding of the Internet AS topology. 

Table 6: Comparison of AS topology datasets from multiple peering points.

Topology    Nodes    Links    Avg.   Max.     Top clique  Max.          Max.      Assort.  Clust.  Max.
                              deg.   degree   size        betweenness   coreness  coef.    coef.   closeness
1 peer      17,952   34,617   3.86   980      4           35,069,182    9         -0.18    0.008   <0.01
10 peers    27,838   64,717   4.65   2,731    7           52,862,315    20        -0.18    0.007   <0.01
25 peers    27,885   67,659   4.85   2,808    7           49,798,002    25        -0.19    0.01    <0.01
All peers   27,924   70,064   5.02   3,371    7           70,142,726    30        -0.18    0.01    <0.01

9. CONCLUSIONS

In this paper we evaluated the existing AS topology
generation models, by comparing them with several observed
AS topologies. For this evaluation, we relied on a
wide set of topological metrics and statistical measures
to carry out our comparison as objectively as possible. Our
analysis revealed that:

• Increasing the number of observation points causes
deviation from strict degree power-law scaling. Existing
topology generation models overemphasize
the preferential attachment mechanism and the
resulting node degree distribution. The power-law
assumption is thus an artefact of incomplete
datasets, rather than a property of the AS-level
topology.

• In addition to clustering and centrality properties, 
the highly meshed core of the Internet AS topology 
must be considered in order to generate represen- 
tative synthetic topologies. 

• The successive improvements in topology generation
models seem to result from improved available
datasets. Knowing that incomplete datasets
were the cause for simplistic topology generation 
models, we expect that the new generation of topol- 
ogy models will take into account the insights gained 
in this paper. 

Our findings indicate that future work in this area 
should consider the geographical extent of the AS graphs, 
the AS sizes, multiple peerings between ISPs, policy 
routing and topology dynamics. Future AS topology 
generators should also permit the addition of metadata 
such as peering relationships and relative importance of 
nodes. 